Time Series Statistics

This module contains functions for statistical operations on time series data.

timeseries_stats.auto_corr(data, max_lag)

Calculate autocorrelation of a time series for a range of lag values up to max_lag.

Uses pandas.Series.autocorr() to calculate the autocorrelation of a single column of data (i.e. a pandas.Series) for a range of lag values up to max_lag.

Parameters
  • data – Time series data as a pandas Series.

  • max_lag – Maximum time lag (as a number of index positions) up to which the autocorrelation is calculated.

Returns

DataFrame with a lags column and the autocorrelation value at each lag.
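
The description above can be illustrated with a minimal sketch. It is not the module's own code: the function name auto_corr_sketch, the output column name "autocorrelation", and the choice to cover lags 0 to max_lag inclusive are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def auto_corr_sketch(data: pd.Series, max_lag: int) -> pd.DataFrame:
    """Autocorrelation of `data` at lags 0..max_lag (lag range is an assumption)."""
    lags = list(range(max_lag + 1))
    # Series.autocorr(lag) is the Pearson correlation between the series
    # and a copy of itself shifted by `lag` positions.
    values = [data.autocorr(lag=lag) for lag in lags]
    # "autocorrelation" is a hypothetical column name chosen for this sketch.
    return pd.DataFrame({"lags": lags, "autocorrelation": values})


# Sine wave with a period of 12 samples, four full periods long.
t = np.arange(48)
series = pd.Series(np.sin(2 * np.pi * t / 12))
acf = auto_corr_sketch(series, max_lag=12)
print(acf)
```

For this periodic input the autocorrelation is close to -1 at lag 6 (half a period) and close to 1 at lag 12 (a full period).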

timeseries_stats.corr(data1, data2, max_lag)

Calculate correlation of two time series for a range of lags between them.

Uses pandas.Series.corr() to calculate the correlation between two columns of data (each a pandas.Series), with data2 shifted relative to data1 by a range of lags up to max_lag.

Parameters
  • data1 – Time series data as a pandas Series.

  • data2 – Time series data as a pandas Series. This is the series that is shifted relative to data1.

  • max_lag – Maximum time lag (as a number of index positions) up to which the correlation is calculated.

Returns

DataFrame with a lags column and the correlation value at each lag.
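
A minimal sketch of a lagged correlation along these lines follows. The function name lagged_corr_sketch, the column name "correlation", the lag range 0 to max_lag, and the use of a positive Series.shift() on data2 are assumptions for illustration; the module may shift in the other direction or cover a different range.

```python
import numpy as np
import pandas as pd


def lagged_corr_sketch(data1: pd.Series, data2: pd.Series, max_lag: int) -> pd.DataFrame:
    """Correlation of data1 with data2 shifted by 0..max_lag positions."""
    lags = list(range(max_lag + 1))
    # shift(lag) moves data2 forward by `lag` positions and pads with NaN;
    # Series.corr ignores pairs in which either value is NaN.
    values = [data1.corr(data2.shift(lag)) for lag in lags]
    # "correlation" is a hypothetical column name chosen for this sketch.
    return pd.DataFrame({"lags": lags, "correlation": values})


t = np.arange(60)
data1 = pd.Series(np.sin(2 * np.pi * t / 12))
# data2 leads data1 by 3 samples, so shifting it by 3 realigns the two.
data2 = pd.Series(np.sin(2 * np.pi * (t + 3) / 12))

xcorr = lagged_corr_sketch(data1, data2, max_lag=6)
best_lag = int(xcorr.loc[xcorr["correlation"].idxmax(), "lags"])
print(best_lag)
```

In this example the correlation peaks at the lag that undoes the 3-sample offset between the two series.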

timeseries_stats.remove_seasonality(ts, T)

Remove periodic repetition of period T from time series data.

Uses differencing to compare equivalent points in different periods, e.g. signal[i] = data[i] - data[i-T]. Note that this shortens the time series by T points. If ts has more than one column of data, a deseasonalised time series is returned for each column.

Parameters
  • ts – Time series data as a pandas DataFrame.

  • T – Period of seasonality to be removed.

Return ts_diff

DataFrame with the same columns as ts, in which the data columns are deseasonalised and the time column is correspondingly shorter.
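
Seasonal differencing of this kind can be sketched with pandas.DataFrame.diff(). The function name remove_seasonality_sketch, the time column name "time", and the choice to keep the later T..end time values are assumptions for the example, not details taken from the module.

```python
import numpy as np
import pandas as pd


def remove_seasonality_sketch(ts: pd.DataFrame, T: int, time_col: str = "time") -> pd.DataFrame:
    """Difference every data column against its value T rows earlier."""
    data_cols = [c for c in ts.columns if c != time_col]
    out = ts.copy()
    # diff(periods=T) computes data[i] - data[i-T]; the first T rows are NaN.
    out[data_cols] = ts[data_cols].diff(periods=T)
    # Drop the first T rows, which have no earlier period to compare with.
    return out.iloc[T:].reset_index(drop=True)


# Linear trend plus a seasonal component with a period of 12 samples.
t = np.arange(36)
ts = pd.DataFrame({"time": t, "signal": 0.5 * t + np.sin(2 * np.pi * t / 12)})
ts_diff = remove_seasonality_sketch(ts, T=12)
print(ts_diff.head())
```

Here the seasonal term cancels and only the change in the trend over one period (0.5 * 12) remains, and the result is 12 rows shorter than the input.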

timeseries_stats.remove_trend(ts, N)

Remove a best fitting polynomial of degree N from time series data.

Uses the numpy functions polyfit, to find the coefficients of the degree N polynomial of best fit (least-squares residuals), and polyval, to evaluate that polynomial over the duration of the time series. If ts has more than one column of data, the trend and detrended data are returned for each column.

Parameters
  • ts – Time series data as a pandas DataFrame.

  • N – Degree of polynomial trend to remove.

Return ts_detrended

Time series composed of the time column and two output columns per input data column: fit_<data_col> holds the values of the best-fitting polynomial at each time; detrended_<data_col> holds the original data with the fitted trend subtracted.
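
The polynomial detrending described above can be sketched with numpy.polyfit and numpy.polyval. The function name remove_trend_sketch and the time column name "time" are assumptions; the fit_ and detrended_ prefixes follow the return description.

```python
import numpy as np
import pandas as pd


def remove_trend_sketch(ts: pd.DataFrame, N: int, time_col: str = "time") -> pd.DataFrame:
    """Fit and subtract a degree-N polynomial from every data column."""
    t = ts[time_col].to_numpy(dtype=float)
    out = ts[[time_col]].copy()
    for col in ts.columns:
        if col == time_col:
            continue
        y = ts[col].to_numpy(dtype=float)
        # Least-squares coefficients, highest power first.
        coeffs = np.polyfit(t, y, N)
        # Evaluate the fitted polynomial at every time value.
        fit = np.polyval(coeffs, t)
        out[f"fit_{col}"] = fit
        out[f"detrended_{col}"] = y - fit
    return out


# Linear trend with an oscillation on top.
t = np.arange(50)
ts = pd.DataFrame({"time": t, "y": 2.0 + 0.3 * t + np.sin(t)})
ts_detrended = remove_trend_sketch(ts, N=1)
print(ts_detrended.head())
```

Because the fit includes a constant term, the detrended column averages to approximately zero, and adding the fit back recovers the original data.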

timeseries_stats.rolling_mean(ts, window)

Calculate rolling mean of time series.

Uses pandas.DataFrame.rolling() to calculate the rolling mean for a given window size. If ts has more than one column of data, the rolling mean is returned for each column using the same window size. Returns NaN for times before the first full window.

Parameters
  • ts – Time series data as a pandas DataFrame.

  • window – Window size over which to calculate mean (int).

Return ts_std

DataFrame with the same columns as ts, but with the rolling mean in place of each data column.
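
A minimal sketch of a rolling mean over the data columns follows. The function name rolling_mean_sketch and the time column name "time" are assumptions; the module may treat the time column differently.

```python
import numpy as np
import pandas as pd


def rolling_mean_sketch(ts: pd.DataFrame, window: int, time_col: str = "time") -> pd.DataFrame:
    """Replace every data column with its rolling mean over `window` rows."""
    data_cols = [c for c in ts.columns if c != time_col]
    out = ts.copy()
    # By default rolling() needs a full window, so the first window - 1
    # rows of each data column are NaN.
    out[data_cols] = ts[data_cols].rolling(window=window).mean()
    return out


ts = pd.DataFrame({"time": np.arange(6), "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
ts_mean = rolling_mean_sketch(ts, window=3)
print(ts_mean)
```

With a window of 3, the first two rows of the value column are NaN and the remaining rows hold the mean of the current and two preceding values.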

timeseries_stats.rolling_std(ts, window)

Calculate rolling standard deviation of time series.

Uses pandas.DataFrame.rolling() to calculate the rolling standard deviation for a given window size. If ts has more than one column of data, the rolling standard deviation is returned for each column using the same window size. Returns NaN for times before the first full window.

Parameters
  • ts – Time series data as a pandas DataFrame.

  • window – Window size over which to calculate std dev (int).

Return ts_std

DataFrame with the same columns as ts, but with the rolling standard deviation in place of each data column.
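
The rolling standard deviation can be sketched in the same way. The function name rolling_std_sketch and the time column name "time" are assumptions. Note that pandas computes the sample standard deviation (ddof=1) by default; whether the module keeps that default is not stated here.

```python
import numpy as np
import pandas as pd


def rolling_std_sketch(ts: pd.DataFrame, window: int, time_col: str = "time") -> pd.DataFrame:
    """Replace every data column with its rolling std dev over `window` rows."""
    data_cols = [c for c in ts.columns if c != time_col]
    out = ts.copy()
    # Rolling.std() uses ddof=1 (sample standard deviation) unless told otherwise.
    out[data_cols] = ts[data_cols].rolling(window=window).std()
    return out


ts = pd.DataFrame({"time": np.arange(6), "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
ts_std = rolling_std_sketch(ts, window=3)
print(ts_std)
```

For evenly spaced values such as these, every full window of 3 has the same spread, so the rolling standard deviation is constant after the initial NaN rows.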